Point Estimate is a single value calculated from a sample of data that is used to estimate the value of an unknown parameter of a population.
Confidence Interval is a range of values that likely contains the true population parameter with a certain level of confidence.
Level of Confidence is the probability that the procedure used to build the confidence interval captures the true value. At a 95% level, about 95 out of 100 intervals built this way would contain it.
Bootstrapping is a technique used to estimate the sampling distribution of a statistic by repeatedly resampling from the observed data with replacement.
This is our sample.
\[(-5,0)*--*-*-*-*---(0,0)--\stackrel{\mu}{|}--*--*--*-*-- (5,0)\]
Then we randomly pull 3 values, which become our data.
\[(-5,0)---*-----(0,0) \stackrel{\mu}{|}----*-----*-- (5,0)\]
Plot our mean then throw those values back in (i.e., replacement) and pull another 3.
\[(-5,0)--------(0,0)----*--*-\stackrel{\mu}{|}-*---- (5,0)\]
Plot that mean and repeat!
Eventually we will have many means. Our example here shows only 9, but in practice we would have hundreds or even thousands, huddled around a similar point on our number line.
\[(-5,0)-\stackrel{\mu}{|}----\stackrel{\mu}{|}---(0,0)---\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}---\stackrel{\mu}{|}- (5,0)\]
Once we have these, we have a robust picture of where the true mean is likely to lie, built from a single sample.
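The resampling loop described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the sample values, the `bootstrap_means` name, the seed, and the number of replicates are all assumptions made for the example. A standard bootstrap usually draws resamples the same size as the original sample; `resample_size=3` is used here only to mirror the pull-3 illustration.

```python
import random
import statistics


def bootstrap_means(sample, n_replicates=1000, resample_size=None, seed=0):
    """Return the means of repeated resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    if resample_size is None:
        resample_size = len(sample)  # the usual bootstrap choice
    means = []
    for _ in range(n_replicates):
        # choices() samples WITH replacement, so a value can be drawn twice
        resample = rng.choices(sample, k=resample_size)
        means.append(statistics.mean(resample))
    return means


# Hypothetical sample on the -5..5 number line (made-up values)
sample = [-4.2, -3.1, -2.5, -1.8, -1.0, 0.9, 1.7, 2.6, 3.3, 4.1]
means = bootstrap_means(sample, n_replicates=1000, resample_size=3)
print(len(means), min(means), max(means))
```

Each entry in `means` plays the role of one plotted mean on the number line.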
Well, when we bootstrapped we essentially created an empirical sampling distribution.
\[(-5,0)-\stackrel{\mu}{|}----\stackrel{\mu}{|}---(0,0)---\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}---\stackrel{\mu}{|}- (5,0)\]
And as a result, we can identify the percentiles that correspond to our desired confidence level.
These percentiles from the bootstrap replicates become the lower and upper bounds of what we will call a confidence interval.
Now that we have this sampling distribution we can calculate our 95% confidence interval.
Loosely, this means we are 95% confident that our population mean falls between the two red lines.
Put another way, about 5% of the intervals built this way would miss the population mean.
\[(-5,0)-\stackrel{\mu}{|}--\stackrel{\mu}{|}---(0,0)---\color{red}\vert-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\stackrel{\mu}{|}-\color{red}\vert--\stackrel{\mu}{|}- (5,0)\]
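The two red lines can be found by sorting the bootstrap means and reading off the 2.5th and 97.5th percentiles. The sketch below assumes a simple nearest-rank percentile rule; the function name `percentile_interval` and the sample values are hypothetical, and other percentile conventions would give slightly different bounds.

```python
import random
import statistics


def percentile_interval(replicates, confidence=0.95):
    """Return (lower, upper) bounds using nearest-rank percentiles."""
    ordered = sorted(replicates)
    alpha = 1 - confidence
    # Positions that cut off alpha/2 of the replicates in each tail
    lower_idx = round((alpha / 2) * len(ordered))
    upper_idx = round((1 - alpha / 2) * len(ordered)) - 1
    # Clamp so very short lists still return valid positions
    lower_idx = max(0, min(lower_idx, len(ordered) - 1))
    upper_idx = max(0, min(upper_idx, len(ordered) - 1))
    return ordered[lower_idx], ordered[upper_idx]


# Hypothetical sample and bootstrap replicates (made-up values, fixed seed)
rng = random.Random(0)
sample = [-4.2, -3.1, -2.5, -1.8, -1.0, 0.9, 1.7, 2.6, 3.3, 4.1]
replicates = [
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(1000)
]

lower, upper = percentile_interval(replicates, confidence=0.95)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

With 1000 replicates, this keeps the middle 950 means and drops 25 from each tail.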
\[\text{CI} = \bar{x} \pm z \frac{s}{\sqrt{n}}\]
\(\bar{x}\) = sample mean
\(z\) = critical value (z-score) for the chosen confidence level, e.g., 1.96 for 95%
\(s\) = sample standard deviation
\(n\) = sample size
A researcher wants to estimate the average number of hours per week spent by employees in a certain company on remote work activities.
They take a random sample of 80 employees and find that the average number of hours spent on remote work per week in the sample is 12 hours, with a standard deviation of 2 hours.
Calculate the 95% confidence interval for the average number of hours spent on remote work activities per week by all employees in the company.
\[\text{CI} = \bar{x} \pm z \frac{s}{\sqrt{n}}\]
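As a quick check on the remote work example, the formula can be evaluated in Python. This is a sketch under the assumption that the normal (z) critical value is appropriate here; the function name `z_confidence_interval` is made up for illustration, and the critical value comes from the standard library's `NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for mean +/- z * sd / sqrt(n)."""
    # Two-sided critical value: about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Values from the example: n = 80, sample mean = 12 hours, sd = 2 hours
lower, upper = z_confidence_interval(mean=12, sd=2, n=80)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) hours")  # 95% CI: (11.56, 12.44) hours
```

The margin of error is about 0.44 hours, so the interval is fairly tight around the sample mean of 12.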
With smaller samples (n < 30), we cannot rely on the Central Limit Theorem, so we use a different distribution called the t distribution.
In order to locate our t-value, we have to locate the degrees of freedom.
Degrees of freedom are the number of independent values that are free to vary in the calculation of a statistic.
\[df = n - 1\]
n = sample size
But why n-1?
Let’s consider an example where 4 people are each asked to pick a number so that the four numbers sum to 50.
Person 1 chooses 10
Person 2 chooses 20
Person 3 chooses 10
Person 4 chooses…… Oof they can’t choose anything! They are no longer free to vary.
They have to choose 10 because we want a sum that equals 50.
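Written as arithmetic, with \(x_4\) (a label introduced here for illustration) standing for Person 4's number:
\[x_4 = 50 - (10 + 20 + 10) = 10\]
Three choices were free and the fourth was forced, so \(df = 4 - 1 = 3\).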
Once we have a df and our confidence level, we head to page 373 in our book or head to this link.
A biology class wants to estimate the average length of a certain species of fish in a lake.
They take a random sample of 15 fish and measure their lengths. The sample mean length is 30 cm, and the sample standard deviation is 4 cm.
Calculate the 95% confidence interval for the average length of this species of fish in the lake.
\[\text{CI} = \bar{x} \pm t \frac{s}{\sqrt{n}}\]
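Working the fish example through, as a sketch: with \(n = 15\) we have \(df = 15 - 1 = 14\). A standard two-tailed t table gives \(t \approx 2.145\) for 95% confidence at 14 degrees of freedom; this value is taken from a general t table and is assumed to match the one referenced above.
\[\text{CI} = 30 \pm 2.145 \cdot \frac{4}{\sqrt{15}} \approx 30 \pm 2.22\]
So the interval runs from about 27.78 cm to 32.22 cm.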